Improve The Linear Regression Model in Bioinformatics Using Text Mining

Abstract

Linear regression is a commonly used approach in bioinformatics. One of the main challenges in applying linear regression in bioinformatics is that the number of regression weights to be determined is often at least one order of magnitude larger than the number of data points available for training. This sparse data problem often reduces the reliability of the estimated regression weights and leads to a significant increase in the regression error. In this paper, we present a text mining approach that can effectively reduce the regression error given sparse data. The main idea is to extract keyword profiles from the research articles that describe the properties of the genes involved in the regression model. These keyword profiles are then used to construct a similarity measure between genes, which in turn regularizes the assignment of regression weights. More specifically, genes with similar keyword profiles are likely to be assigned similar weights. One of the key challenges in exploiting text information in regression models is that many of the extracted keywords may not be relevant to the biological process under study; as a result, the similarity measure may not reflect the true relationships among the genes in that process. To resolve this problem, we present a full Bayesian framework that automatically determines the importance of each keyword in determining the similarity of genes in their roles in a given biological process. Empirical studies with a real biological dataset show that the proposed Bayesian framework is effective in exploiting the text profiles of genes to reduce the regression error.
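The core idea of the abstract, that genes with similar keyword profiles should receive similar regression weights, can be illustrated with a minimal sketch. This is not the paper's Bayesian framework: it assumes a cosine similarity over keyword counts and a fixed graph-Laplacian penalty solved in closed form, and the keyword importances are passed in as fixed numbers rather than learned. The function names, the `lam` and `ridge` parameters, and the toy data are all hypothetical.

```python
import numpy as np


def keyword_similarity(profiles, keyword_weights=None):
    """Cosine similarity between gene keyword profiles (rows = genes).

    Assumes non-negative keyword counts, so every similarity is >= 0.
    keyword_weights is an assumed, fixed importance per keyword; the
    paper learns these importances, which this sketch does not do.
    """
    profiles = np.asarray(profiles, dtype=float)
    if keyword_weights is not None:
        profiles = profiles * np.sqrt(np.asarray(keyword_weights, dtype=float))
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms == 0.0, 1.0, norms)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # a gene is not its own neighbour
    return sim


def fit_similarity_regularized(X, y, sim, lam=1.0, ridge=1e-3):
    """Minimize ||y - Xw||^2 + lam * sum_{i<j} S_ij (w_i - w_j)^2 + ridge * ||w||^2.

    The pairwise penalty equals lam * w^T L w, where L is the graph
    Laplacian of the similarity matrix, so the minimizer solves a
    single linear system.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    laplacian = np.diag(sim.sum(axis=1)) - sim
    A = X.T @ X + lam * laplacian + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


# Toy data (made up): 4 samples, 6 genes, 5 keywords.
rng = np.random.default_rng(0)
toy_profiles = rng.integers(0, 4, size=(6, 5))
toy_X = rng.normal(size=(4, 6))
toy_y = rng.normal(size=4)

toy_sim = keyword_similarity(toy_profiles)
toy_w = fit_similarity_regularized(toy_X, toy_y, toy_sim, lam=2.0)
print(toy_w)
```

With more genes than samples, the unregularized least-squares problem has no unique solution; the similarity penalty supplies the extra constraints by pulling the weights of textually similar genes toward each other. Larger values of `lam` enforce this more strongly.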

Similar resources

Application of non-linear regression and soft computing techniques for modeling process of pollutant adsorption from industrial wastewaters

The process of pollutant adsorption from industrial wastewaters is a multivariate problem. This process is affected by many factors including the contact time (T), pH, adsorbent weight (m), and solution concentration (ppm). The main target of this work is to model and evaluate the process of pollutant adsorption from industrial wastewaters using the non-linear multivariate regression and intell...

Improving biological activity prediction of protein kinase inhibitors using artificial neural network and partial least square methods

Introduction: Protein kinases cause many diseases, including cancer; therefore, inhibiting them plays an important role in the treatment of many diseases. Traditional discovery of inhibitors of these enzymes is a time-consuming and costly process. Finding a reliable computer-aided drug discovery tool that can detect the inhibitors will reduce the cost. In this study, it is attempted to separate ki...

Prediction of Blasting Cost in Limestone Mines Using Gene Expression Programming Model and Artificial Neural Networks

The use of blasting cost (BC) prediction to achieve optimal fragmentation is necessary in order to control the adverse consequences of blasting such as fly rock, ground vibration, and air blast in open-pit mines. In this research work, BC is predicted through collecting 146 blasting data from six limestone mines in Iran using the artificial neural networks (ANNs), gene expression programming (G...

Prediction of ultimate strength of shale using artificial neural network

A rock failure criterion is very important for prediction of the ultimate strength in rock mechanics and geotechnics; it is determined for rock mechanics studies in mining, civil, and oil wellbore drilling operations. Also, shales are among the most difficult formations to treat. Therefore, in this research work, using the artificial neural network (ANN), a model was built to predict the ultimat...

Publication date: 2006